{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading the Open COVID-19 Dataset\n",
    "This very short notebook showcases how to load the [Open COVID-19 datset](https://github.com/open-covid-19/data), including some examples for commonly performed operations.\n",
    "\n",
    "First, loading the data is very simple with `pandas`. We can use the CSV or the JSON file to download the entire Open COVID-19 dataset in a single step:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": "The dataset currently contains 13229 records, here are the last few:\n"
    },
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": "             Date Key  Confirmed  Deaths CountryCode  \\\n13224  2020-03-29  PR      100.0     3.0          PR   \n13225  2020-03-30  PR      127.0     5.0          PR   \n13226  2020-03-31  PR      174.0     6.0          PR   \n13227  2020-03-31  FK        0.0     0.0          FK   \n13228  2020-03-31  MP        2.0     0.0          MP   \n\n                    CountryName RegionCode RegionName   Latitude   Longitude  \\\n13224               Puerto Rico        NaN        NaN  18.220833  -66.590149   \n13225               Puerto Rico        NaN        NaN  18.220833  -66.590149   \n13226               Puerto Rico        NaN        NaN  18.220833  -66.590149   \n13227          Falkland Islands        NaN        NaN -51.796253  -59.523613   \n13228  Northern Mariana Islands        NaN        NaN  17.330830  145.384690   \n\n       Population  \n13224   2933408.0  \n13225   2933408.0  \n13226   2933408.0  \n13227      3377.0  \n13228     56188.0  ",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Date</th>\n      <th>Key</th>\n      <th>Confirmed</th>\n      <th>Deaths</th>\n      <th>CountryCode</th>\n      <th>CountryName</th>\n      <th>RegionCode</th>\n      <th>RegionName</th>\n      <th>Latitude</th>\n      <th>Longitude</th>\n      <th>Population</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>13224</th>\n      <td>2020-03-29</td>\n      <td>PR</td>\n      <td>100.0</td>\n      <td>3.0</td>\n      <td>PR</td>\n      <td>Puerto Rico</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>18.220833</td>\n      <td>-66.590149</td>\n      <td>2933408.0</td>\n    </tr>\n    <tr>\n      <th>13225</th>\n      <td>2020-03-30</td>\n      <td>PR</td>\n      <td>127.0</td>\n      <td>5.0</td>\n      <td>PR</td>\n      <td>Puerto Rico</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>18.220833</td>\n      <td>-66.590149</td>\n      <td>2933408.0</td>\n    </tr>\n    <tr>\n      <th>13226</th>\n      <td>2020-03-31</td>\n      <td>PR</td>\n      <td>174.0</td>\n      <td>6.0</td>\n      <td>PR</td>\n      <td>Puerto Rico</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>18.220833</td>\n      <td>-66.590149</td>\n      <td>2933408.0</td>\n    </tr>\n    <tr>\n      <th>13227</th>\n      <td>2020-03-31</td>\n      <td>FK</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>FK</td>\n      <td>Falkland Islands</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>-51.796253</td>\n      <td>-59.523613</td>\n      <td>3377.0</td>\n    </tr>\n    <tr>\n      <th>13228</th>\n      <td>2020-03-31</td>\n      <td>MP</td>\n      <td>2.0</td>\n      <td>0.0</td>\n      <td>MP</td>\n      <td>Northern Mariana Islands</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>17.330830</td>\n      <td>145.384690</td>\n      <td>56188.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "metadata": {},
     "execution_count": 1
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Load CSV data directly from the URL with pandas\n",
    "data = pd.read_csv('https://open-covid-19.github.io/data/data.csv')\n",
    "\n",
    "# Alternatively load the JSON data, which should be identical\n",
    "data_json = pd.read_json('https://open-covid-19.github.io/data/data.json')\n",
    "assert len(data) == len(data_json)\n",
    "\n",
    "# Print a small snippet of the dataset\n",
    "print('The dataset currently contains %d records, here are the last few:' % len(data))\n",
    "data.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Looking at country-level data\n",
    "Some records contain country-level data, in other words, data that is aggregated at the country level. Other records contain region-level data, which are subdivisions of a country; for example, Chinese provinces or USA states.\n",
    "\n",
    "To filter only country-level data from the dataset, look for records that have a null value for the region:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": "             Date Key  Confirmed  Deaths CountryCode  \\\n13224  2020-03-29  PR      100.0     3.0          PR   \n13225  2020-03-30  PR      127.0     5.0          PR   \n13226  2020-03-31  PR      174.0     6.0          PR   \n13227  2020-03-31  FK        0.0     0.0          FK   \n13228  2020-03-31  MP        2.0     0.0          MP   \n\n                    CountryName   Latitude   Longitude  Population  \n13224               Puerto Rico  18.220833  -66.590149   2933408.0  \n13225               Puerto Rico  18.220833  -66.590149   2933408.0  \n13226               Puerto Rico  18.220833  -66.590149   2933408.0  \n13227          Falkland Islands -51.796253  -59.523613      3377.0  \n13228  Northern Mariana Islands  17.330830  145.384690     56188.0  ",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Date</th>\n      <th>Key</th>\n      <th>Confirmed</th>\n      <th>Deaths</th>\n      <th>CountryCode</th>\n      <th>CountryName</th>\n      <th>Latitude</th>\n      <th>Longitude</th>\n      <th>Population</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>13224</th>\n      <td>2020-03-29</td>\n      <td>PR</td>\n      <td>100.0</td>\n      <td>3.0</td>\n      <td>PR</td>\n      <td>Puerto Rico</td>\n      <td>18.220833</td>\n      <td>-66.590149</td>\n      <td>2933408.0</td>\n    </tr>\n    <tr>\n      <th>13225</th>\n      <td>2020-03-30</td>\n      <td>PR</td>\n      <td>127.0</td>\n      <td>5.0</td>\n      <td>PR</td>\n      <td>Puerto Rico</td>\n      <td>18.220833</td>\n      <td>-66.590149</td>\n      <td>2933408.0</td>\n    </tr>\n    <tr>\n      <th>13226</th>\n      <td>2020-03-31</td>\n      <td>PR</td>\n      <td>174.0</td>\n      <td>6.0</td>\n      <td>PR</td>\n      <td>Puerto Rico</td>\n      <td>18.220833</td>\n      <td>-66.590149</td>\n      <td>2933408.0</td>\n    </tr>\n    <tr>\n      <th>13227</th>\n      <td>2020-03-31</td>\n      <td>FK</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>FK</td>\n      <td>Falkland Islands</td>\n      <td>-51.796253</td>\n      <td>-59.523613</td>\n      <td>3377.0</td>\n    </tr>\n    <tr>\n      <th>13228</th>\n      <td>2020-03-31</td>\n      <td>MP</td>\n      <td>2.0</td>\n      <td>0.0</td>\n      <td>MP</td>\n      <td>Northern Mariana Islands</td>\n      <td>17.330830</td>\n      <td>145.384690</td>\n      <td>56188.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "metadata": {},
     "execution_count": 2
    }
   ],
   "source": [
    "# Look for rows with null RegionCode\n",
    "countries = data[data['RegionCode'].isna()]\n",
    "\n",
    "# We no longer need the region-level columns\n",
    "countries = countries.drop(columns=['RegionCode', 'RegionName'])\n",
    "\n",
    "countries.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Looking at region-level data\n",
    "Conversely, to filter region-level data for a specific country, we need to look for records where the region columns have non-null values. The following snippet extracts data related to Spain's subregions from the dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": "            Date    Key  Confirmed  Deaths CountryCode CountryName RegionCode  \\\n9404  2020-03-27  ES_VC     3532.0   198.0          ES       Spain         VC   \n9405  2020-03-28  ES_VC     4034.0   234.0          ES       Spain         VC   \n9406  2020-03-29  ES_VC     4784.0   267.0          ES       Spain         VC   \n9407  2020-03-30  ES_VC     5110.0   310.0          ES       Spain         VC   \n9408  2020-03-31  ES_VC     5508.0   339.0          ES       Spain         VC   \n\n                RegionName  Latitude  Longitude  Population  \n9404  Comunidad Valenciana   39.4697    -0.3774         NaN  \n9405  Comunidad Valenciana   39.4697    -0.3774         NaN  \n9406  Comunidad Valenciana   39.4697    -0.3774         NaN  \n9407  Comunidad Valenciana   39.4697    -0.3774         NaN  \n9408  Comunidad Valenciana   39.4697    -0.3774         NaN  ",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Date</th>\n      <th>Key</th>\n      <th>Confirmed</th>\n      <th>Deaths</th>\n      <th>CountryCode</th>\n      <th>CountryName</th>\n      <th>RegionCode</th>\n      <th>RegionName</th>\n      <th>Latitude</th>\n      <th>Longitude</th>\n      <th>Population</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>9404</th>\n      <td>2020-03-27</td>\n      <td>ES_VC</td>\n      <td>3532.0</td>\n      <td>198.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>VC</td>\n      <td>Comunidad Valenciana</td>\n      <td>39.4697</td>\n      <td>-0.3774</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>9405</th>\n      <td>2020-03-28</td>\n      <td>ES_VC</td>\n      <td>4034.0</td>\n      <td>234.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>VC</td>\n      <td>Comunidad Valenciana</td>\n      <td>39.4697</td>\n      <td>-0.3774</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>9406</th>\n      <td>2020-03-29</td>\n      <td>ES_VC</td>\n      <td>4784.0</td>\n      <td>267.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>VC</td>\n      <td>Comunidad Valenciana</td>\n      <td>39.4697</td>\n      <td>-0.3774</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>9407</th>\n      <td>2020-03-30</td>\n      <td>ES_VC</td>\n      <td>5110.0</td>\n      <td>310.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>VC</td>\n      <td>Comunidad Valenciana</td>\n      <td>39.4697</td>\n      <td>-0.3774</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>9408</th>\n      <td>2020-03-31</td>\n      <td>ES_VC</td>\n      <td>5508.0</td>\n      <td>339.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>VC</td>\n      <td>Comunidad Valenciana</td>\n      <td>39.4697</td>\n      <td>-0.3774</td>\n      <td>NaN</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "metadata": {},
     "execution_count": 3
    }
   ],
   "source": [
    "# Filter records that have the right country code AND a non-null region code\n",
    "spain_regions = data[(data['CountryCode'] == 'ES') & ~(data['RegionCode'].isna())]\n",
    "\n",
    "spain_regions.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using the `Key` column\n",
    "The `Key` column is present in all datasets and is unique for each country-region combination. This way, we can retrieve a specific country or region using a single filter for the data. The `Key` column is built using `CountryCode` for country-level data, otherwise `${CountryCode}_${RegionCode}`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": "            Date Key  Confirmed  Deaths CountryCode CountryName   Latitude  \\\n5731  2020-03-28  ES    64059.0  4858.0          ES       Spain  40.463667   \n5732  2020-03-29  ES    72248.0  5690.0          ES       Spain  40.463667   \n5733  2020-03-30  ES    78797.0  6528.0          ES       Spain  40.463667   \n5734  2020-03-31  ES    85195.0  7340.0          ES       Spain  40.463667   \n5735  2020-04-01  ES    94417.0  8189.0          ES       Spain  40.463667   \n\n      Longitude  Population  \n5731   -3.74922  46736776.0  \n5732   -3.74922  46736776.0  \n5733   -3.74922  46736776.0  \n5734   -3.74922  46736776.0  \n5735   -3.74922  46736776.0  ",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Date</th>\n      <th>Key</th>\n      <th>Confirmed</th>\n      <th>Deaths</th>\n      <th>CountryCode</th>\n      <th>CountryName</th>\n      <th>Latitude</th>\n      <th>Longitude</th>\n      <th>Population</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>5731</th>\n      <td>2020-03-28</td>\n      <td>ES</td>\n      <td>64059.0</td>\n      <td>4858.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>40.463667</td>\n      <td>-3.74922</td>\n      <td>46736776.0</td>\n    </tr>\n    <tr>\n      <th>5732</th>\n      <td>2020-03-29</td>\n      <td>ES</td>\n      <td>72248.0</td>\n      <td>5690.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>40.463667</td>\n      <td>-3.74922</td>\n      <td>46736776.0</td>\n    </tr>\n    <tr>\n      <th>5733</th>\n      <td>2020-03-30</td>\n      <td>ES</td>\n      <td>78797.0</td>\n      <td>6528.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>40.463667</td>\n      <td>-3.74922</td>\n      <td>46736776.0</td>\n    </tr>\n    <tr>\n      <th>5734</th>\n      <td>2020-03-31</td>\n      <td>ES</td>\n      <td>85195.0</td>\n      <td>7340.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>40.463667</td>\n      <td>-3.74922</td>\n      <td>46736776.0</td>\n    </tr>\n    <tr>\n      <th>5735</th>\n      <td>2020-04-01</td>\n      <td>ES</td>\n      <td>94417.0</td>\n      <td>8189.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>40.463667</td>\n      <td>-3.74922</td>\n      <td>46736776.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "metadata": {},
     "execution_count": 4
    }
   ],
   "source": [
    "# Filter records for Spain at the country-level\n",
    "spain_country = data[data['Key'] == 'ES']\n",
    "\n",
    "# We no longer need the region-level columns\n",
    "spain_country = spain_country.drop(columns=['RegionCode', 'RegionName'])\n",
    "\n",
    "spain_country.tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": "            Date    Key  Confirmed  Deaths CountryCode CountryName RegionCode  \\\n9259  2020-03-27  ES_MD    19243.0  2412.0          ES       Spain         MD   \n9260  2020-03-28  ES_MD    21520.0  2757.0          ES       Spain         MD   \n9261  2020-03-29  ES_MD    22677.0  3082.0          ES       Spain         MD   \n9262  2020-03-30  ES_MD    24090.0  3392.0          ES       Spain         MD   \n9263  2020-03-31  ES_MD    27509.0  3603.0          ES       Spain         MD   \n\n     RegionName  Latitude  Longitude  Population  \n9259     Madrid   40.4165   -3.70256         NaN  \n9260     Madrid   40.4165   -3.70256         NaN  \n9261     Madrid   40.4165   -3.70256         NaN  \n9262     Madrid   40.4165   -3.70256         NaN  \n9263     Madrid   40.4165   -3.70256         NaN  ",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Date</th>\n      <th>Key</th>\n      <th>Confirmed</th>\n      <th>Deaths</th>\n      <th>CountryCode</th>\n      <th>CountryName</th>\n      <th>RegionCode</th>\n      <th>RegionName</th>\n      <th>Latitude</th>\n      <th>Longitude</th>\n      <th>Population</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>9259</th>\n      <td>2020-03-27</td>\n      <td>ES_MD</td>\n      <td>19243.0</td>\n      <td>2412.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>MD</td>\n      <td>Madrid</td>\n      <td>40.4165</td>\n      <td>-3.70256</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>9260</th>\n      <td>2020-03-28</td>\n      <td>ES_MD</td>\n      <td>21520.0</td>\n      <td>2757.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>MD</td>\n      <td>Madrid</td>\n      <td>40.4165</td>\n      <td>-3.70256</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>9261</th>\n      <td>2020-03-29</td>\n      <td>ES_MD</td>\n      <td>22677.0</td>\n      <td>3082.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>MD</td>\n      <td>Madrid</td>\n      <td>40.4165</td>\n      <td>-3.70256</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>9262</th>\n      <td>2020-03-30</td>\n      <td>ES_MD</td>\n      <td>24090.0</td>\n      <td>3392.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>MD</td>\n      <td>Madrid</td>\n      <td>40.4165</td>\n      <td>-3.70256</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>9263</th>\n      <td>2020-03-31</td>\n      <td>ES_MD</td>\n      <td>27509.0</td>\n      <td>3603.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>MD</td>\n      <td>Madrid</td>\n      <td>40.4165</td>\n      <td>-3.70256</td>\n      <td>NaN</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "metadata": {},
     "execution_count": 5
    }
   ],
   "source": [
    "# Filter records for Madrid, one of the subregions of Spain\n",
    "madrid = data[data['Key'] == 'ES_MD']\n",
    "\n",
    "madrid.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Minimal dataset\n",
    "If the `Key`, `Confirmed` and `Deaths` columns are sufficient for your application, you can get the latest data from the `data_minimal` dataset which only contains those columns essential for data analysis, here's how you would get the same data for Madrid:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": "             Date    Key  Confirmed  Deaths CountryCode CountryName  \\\n11720  2020-03-27  ES_MD    19243.0  2412.0          ES       Spain   \n12057  2020-03-28  ES_MD    21520.0  2757.0          ES       Spain   \n12399  2020-03-29  ES_MD    22677.0  3082.0          ES       Spain   \n12740  2020-03-30  ES_MD    24090.0  3392.0          ES       Spain   \n13060  2020-03-31  ES_MD    27509.0  3603.0          ES       Spain   \n\n      RegionCode RegionName  Latitude  Longitude  Population  \n11720         MD     Madrid   40.4165   -3.70256         NaN  \n12057         MD     Madrid   40.4165   -3.70256         NaN  \n12399         MD     Madrid   40.4165   -3.70256         NaN  \n12740         MD     Madrid   40.4165   -3.70256         NaN  \n13060         MD     Madrid   40.4165   -3.70256         NaN  ",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Date</th>\n      <th>Key</th>\n      <th>Confirmed</th>\n      <th>Deaths</th>\n      <th>CountryCode</th>\n      <th>CountryName</th>\n      <th>RegionCode</th>\n      <th>RegionName</th>\n      <th>Latitude</th>\n      <th>Longitude</th>\n      <th>Population</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>11720</th>\n      <td>2020-03-27</td>\n      <td>ES_MD</td>\n      <td>19243.0</td>\n      <td>2412.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>MD</td>\n      <td>Madrid</td>\n      <td>40.4165</td>\n      <td>-3.70256</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>12057</th>\n      <td>2020-03-28</td>\n      <td>ES_MD</td>\n      <td>21520.0</td>\n      <td>2757.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>MD</td>\n      <td>Madrid</td>\n      <td>40.4165</td>\n      <td>-3.70256</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>12399</th>\n      <td>2020-03-29</td>\n      <td>ES_MD</td>\n      <td>22677.0</td>\n      <td>3082.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>MD</td>\n      <td>Madrid</td>\n      <td>40.4165</td>\n      <td>-3.70256</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>12740</th>\n      <td>2020-03-30</td>\n      <td>ES_MD</td>\n      <td>24090.0</td>\n      <td>3392.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>MD</td>\n      <td>Madrid</td>\n      <td>40.4165</td>\n      <td>-3.70256</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>13060</th>\n      <td>2020-03-31</td>\n      <td>ES_MD</td>\n      <td>27509.0</td>\n      <td>3603.0</td>\n      <td>ES</td>\n      <td>Spain</td>\n      <td>MD</td>\n      <td>Madrid</td>\n      <td>40.4165</td>\n      <td>-3.70256</td>\n      <td>NaN</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "metadata": {},
     "execution_count": 6
    }
   ],
   "source": [
    "# Load the minimal dataset\n",
    "minimal = pd.read_csv('https://open-covid-19.github.io/data/data_minimal.csv')\n",
    "\n",
    "# Filter records for Madrid, one of the subregions of Spain\n",
    "madrid = minimal[minimal['Key'] == 'ES_MD']\n",
    "\n",
    "madrid.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data consistency\n",
    "Often, region-level data and country-level data will come from different sources. This will lead to numbers not adding up exactly, or even date misalignment (the data for the region may be reported sooner or later than the whole country). However, country- and region- level data will *always* be self-consistent"
   ]
  }
 ],
 "metadata": {
  "file_extension": ".py",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6-final"
  },
  "mimetype": "text/x-python",
  "name": "python",
  "npconvert_exporter": "python",
  "pygments_lexer": "ipython3",
  "version": 3
 },
 "nbformat": 4,
 "nbformat_minor": 4
}